We have assigned six members of the human β-actin multigene family to specific human chromosomes. The
functional gene, ACTB, is located on human chromosome 7, and the other assigned β-actin-related sequences
are dispersed over at least four different chromosomes including one locus assigned to the X chromosome.

Using intervening sequence probes, we showed that the functional gene is single copy and that all of the other
β-actin-related sequences are recently generated in evolution and are probably processed pseudogenes. The
entire nucleotide sequence of the functional gene has been determined and is identical to cDNA clones in the
coding and 5' untranslated regions. We have previously reported that the 3' untranslated region is well
conserved between humans and rats (Ponte et al., Nucleic Acids Res. 12:1687-1696, 1984). Now we report that
four additional noncoding regions are evolutionarily conserved, including segments of the 5' flanking region,
5' untranslated region, and, surprisingly, intervening sequences I and III. These conserved sequences,
especially those found in the introns, suggest a role for internal sequences in the regulation of β-actin gene
expression.



Cytoskeletal β-actin is one of the most abundant cellular
proteins found in mammalian and avian nonmuscle cells. It is
the major component, together with cytoskeletal γ-actin, of
the microfilamentous structures found in these cells (4).
Functionally, cytoskeletal actin has been implicated in
intracellular movement of organelles, cytokinesis, and cell
motility (55). In addition, sequential mutations in the β-actin
protein have been associated with a parallel increase in the
tumorigenicity of human cells (25). However, the precise
mechanism of β-actin function in these cellular processes
remains enigmatic.

One approach to a better understanding of the functions of
β-actin involves the reintroduction of specifically modified
β-actin genes into nonmuscle cells to observe the effects of
the programmed alterations. This approach first required
isolation and characterization of the gene. In addition, since
the human β-actin gene is a member of a large multigene
family, interpreting the results of such studies requires a
full accounting of the number of functional genes. There are
at least 20 different β-actin gene sequences in the human
genome (41), and similarly sized families are found in the rat
and mouse (41; Gunning, unpublished data). Sequence analysis
has demonstrated that several of the human β-actin genes are
processed pseudogenes (34; J. N. Engel, Ph.D. thesis, Stanford
University, Palo Alto, Calif., 1982). This, in addition to
other data, has led us to propose that the majority of the
human (and rodent) β-actin genes are in fact pseudogenes (41).
However, it is difficult to exclude the possibility that more
than one functional β-actin gene exists. Nucleotide sequencing
of all the human β-actin genes would solve this problem, but
is impractical. We have therefore sought an alternative
strategy to address these issues.

As a first step, we recently isolated an intron-containing
human β-actin gene that is expressed in human fibroblasts
(26). In this report we describe our studies of the
chromosomal distribution of this gene and the other related
β-actin sequences and present a complete structural analysis
for the human β-actin gene. We demonstrate that there is but a
single functional human β-actin gene, that it is located on
chromosome 7, and that the remaining copies are intronless,
have been generated recently, and have been randomly
integrated into the genome.

We have determined the entire nucleotide sequence of the
human β-actin gene and compared it with the corresponding
sequences of the rat and chicken genes. Several noncoding
regions of these genes, including the 5' flanking regions and
two of the five introns, are under strong evolutionary
pressure to retain specific sequences. Conserved segments of an
intron in the 5' untranslated region (UTR) resemble potential
transcription enhancer elements. This strong sequence
conservation may reflect evolutionary pressure to retain a
particularly strong transcription promoter function, since
β-actin is one of the most abundant cellular transcripts in
mammals and birds.

MATERIALS AND METHODS 

Chromosome mapping. Parental cell lines, construction of
human-mouse somatic cell hybrids, and identification of
human chromosomes within the hybrids have been described
previously (15, 47). DNA was isolated from hybrid cells and
their parents, digested with EcoRI, size fractionated on 0.8%
agarose gels, and transferred to nitrocellulose as described
previously (35). Two DNA probes were used for chromosome
mapping. A 1.5-kilobase (kb) SacI DNA fragment was
isolated from the functional human β-actin gene (26) and used
as a probe to specifically detect this gene. The second probe
was derived from the 3' UTR of the human β-actin cDNA as
a 118-base-pair HaeIII DNA fragment which was used to
specifically detect all of the human β-actin-related sequences.
Construction of 3' UTR subclone. Three HaeIII DNA
fragments were isolated from the previously subcloned 3'
UTR of a human β-actin cDNA (pHFβA-3'UT) (41). Fragment 1
extends from codon 366 to base 211 in the 3' UTR,
fragment 2 covers base 212 to base 329 in the 3' UTR, and
fragment 3 extends from base 330 through the end of the 3'
UTR (base 594) to the vector BamHI site. The middle
118-base-pair HaeIII DNA fragment was directly cloned into
the SmaI site of pHP34 (43). This subclone is denoted
pHFβA-3'UT-HH. For convenience, the probe derived from
this subclone is referred to as human β-actin specific
(HβAS). The DNA fragment was isolated from this subclone
by digestion with EcoRI, and the purified fragment was
self-ligated by incubation with T4 ligase (15) before nick
translation.

Construction of intervening sequence (IVS) region
subclones. Restriction fragments of the human β-actin gene
clone pMl(pi)-2 were subcloned by blunt-end ligation of the
appropriate DNA fragments into the SmaI site of pHP34 (43)
as described previously (41). Restriction fragments with
protruding 5' ends were converted to blunt ends by using the
Klenow fragment of Escherichia coli DNA polymerase I.
Restriction fragments with protruding 3' ends were converted
to blunt ends by using the 3' exonuclease activity of
bacteriophage T4 DNA polymerase. The subcloned DNA
fragments (see Fig. 3 for map locations) were recovered from
the plasmids after digestion with the endonuclease EcoRI.
The fragments were purified by electrophoresis and
self-ligated before nick translation.

DNA isolation and Southern blot analysis. DNA was
isolated from HeLa cells and digested with various restriction
enzymes as described previously (7). Digested DNA was
size fractionated on 0.7% agarose gels (7) and transferred to
nitrocellulose (50). Nick-translated DNA fragments (45), 10^8
dpm/μg, were hybridized to nitrocellulose filters and washed
exactly as described previously (15).

Nuclease S1 mapping. The 400-base-pair BstNI-BstNI
DNA fragment spanning the putative mRNA cap site was
end labeled with [γ-32P]ATP (ICN Pharmaceuticals Inc.) in
the presence of polynucleotide kinase (New England Nuclear
Corp.) according to standard procedures (30). The
end-labeled DNA fragment was then digested with
endonuclease XhoI. A 127-base-pair XhoI-BstNI* DNA fragment,
isolated from an agarose gel with Schleicher & Schuell
NA-45 DEAE membrane, was divided into two samples.
The first sample was used for DNA sequencing (30). The
second sample, after denaturation at 80°C, was hybridized to
total RNA from human cells at 60°C for 3 h in hybridization
buffer containing 80% formamide, 40 mM
piperazine-N,N'-bis(2-ethanesulfonic acid) (pH 6.4), 0.4 M sodium
chloride, and 1 mM EDTA (pH 8.0). After hybridization, ice-cold
nuclease S1 buffer and 1,000 U of nuclease S1 (Sigma
Chemical Co.) were added to each sample immediately.
After incubation at 37°C for 15 min, the reaction was stopped
by the addition of ammonium acetate and EDTA. Protected
DNA was precipitated with an equal volume of isopropanol,
washed with 70% ethanol, vacuum dried, and dissolved in
formamide, 0.05% bromphenol blue-xylene cyanol, and
1 mM EDTA. Samples were electrophoresed at 40 V cm^-1 on




FIG. 1. Detection of the functional human β-actin gene (ACTB)
in human-mouse somatic cell hybrids. DNA was isolated from
somatic cell hybrids (lanes 1 through 4) and the parental mouse (lane
5) and human (lane 6) cell lines. After digestion with EcoRI, the
DNA was electrophoresed on a 0.8% agarose gel, transferred to
nitrocellulose, and hybridized with the human β-actin gene-specific
probe (SacI fragment, see Materials and Methods). The human
β-actin gene is located on a 14-kb DNA fragment (indicated by the
arrow), and the mouse DNA fragment migrating more slowly on the
gel is approximately 20 kb in size.

8% acrylamide-8 mM urea thin sequencing gels in parallel
with chemical sequencing cleavage fragments.

DNA sequencing. Restriction fragments were sequenced 
by the method of Maxam and Gilbert (30). Sequence data 
were then managed with the GEL program (IntelliGenetics, 
Inc.). 

DNA sequence alignments. Comparison of DNA sequence
data was managed with the IFIND program (IntelliGenetics,
Inc.) based on the Wilbur and Lipman algorithm (56). All
DNA sequence alignments were carried out with a gap
penalty setting of 4.


RESULTS 

Chromosomal location of the human β-actin gene. We have
previously reported the isolation of the human β-actin gene
as a πAN7 recombinant from a bacteriophage library of
human fetal DNA and demonstrated that it is an expressed
gene (26). This functional human β-actin gene is located on a
14-kb EcoRI DNA fragment in the human genome. A 6.5-kb
fragment of the phage clone, containing the complete coding
region and about 2 kb of 5' flanking DNA, has been
subcloned into the EcoRI site of pBR322. We derived a
1.5-kb SacI DNA fragment from the cloned gene which
covers the region from about 450 to 2,000 base pairs 5' of the
mRNA cap site. This DNA fragment hybridizes to a 14-kb
DNA band in EcoRI-digested human DNA and to a 20-kb
DNA band in EcoRI-digested mouse DNA (lanes 5 and 6,
Fig. 1). Using this DNA probe, we were able to follow the
segregation of the human β-actin gene with human
chromosomes in human-mouse somatic cell hybrids.

Thirty-two human-mouse somatic cell hybrids were tested
for the coordinate presence of the human β-actin gene
(ACTB) and a specific human chromosome and
chromosome-specific isozyme markers. After electrophoresis and
Southern blotting of genomic cell hybrid DNA, the hybrids
were scored for the presence or absence of the human gene
(Fig. 1). Whereas all hybrid cell lines displayed a 20-kb
mouse β-actin DNA fragment (lanes 1 through 4, Fig. 1),
only a subset contained the 14-kb human ACTB band (lanes
1, 2, and 4, Fig. 1). Table 1 shows that in cell hybrids with
different numbers and combinations of human chromosomes,
all human chromosomes except chromosome 7 segregated
discordantly with ACTB. The cell hybrid DUA-1
CSAzF, which retained only human chromosome 7 on a
mouse background, was positive for ACTB, as was cell
hybrid JSR-17 (48), which retained a 7/9 translocation
(7pter→7q22::9p24→9pter) and the 7pter→7q22 region. This
locates the ACTB gene in the pter→q22 region of human
chromosome 7.

TABLE 1. Distribution of ACTB and human chromosomes in human-mouse cell hybrids
a Human-mouse somatic cell hybrids were isolated and characterized as described previously (47).

b The β-actin probe recognizing functional gene sequences was scored in cell hybrid genomic DNA after EcoRI digestion.
c Human chromosomes and enzyme markers already assigned to specific chromosomes were tested for all cell hybrids.

d Translocation chromosomes were well-characterized rearrangements derived from reciprocal translocations observed in human parental cells.

Chromosomal distribution of human β-actin-related
sequences. The human genome contains at least 19
β-actin-related sequences in addition to the functional gene. It is of
particular interest to determine the origin of these
sequences. The two most likely explanations involve either
duplication of the functional gene or generation of
reverse-transcript processed pseudogenes. In the latter case,
processed genes might also be duplicated to further expand the
gene family. The contribution of tandem duplication to the
generation of the human β-actin gene family can be evaluated
by chromosome linkage analysis. If groups of these β-actin
sequences are closely linked, then it is likely that they
have resulted from tandem duplications. Conversely, if few
or no β-actin gene sequences are linked, it is more likely
that tandem duplications have had little or no role in the
generation of this gene family. Earlier investigations (49)
regarding the chromosomal dispersion of actin genes, relying
on in situ hybridization, had concluded that the genes were
dispersed. Since the probes used in those experiments
detected all actin-coding sequences and did not distinguish
among the actin isotypes, the issue of isotype clustering
versus dispersal could not be addressed.

We examined the segregation of these additional β-actin
sequences in human-mouse somatic cells containing
different human chromosomes. To accomplish this we
constructed a DNA probe that recognizes all of the human
β-actin sequences but not those of the mouse. Since the 3'
UTR of the human β-actin gene is strongly conserved
between humans and rodents (41, 42), DNA from the
complete 3' UTR did not discriminate between the human and
mouse sequences. We therefore cut the human β-actin 3'
UTR into several DNA fragments by using the restriction
endonuclease HaeIII and examined the ability of the
resulting fragments to hybridize with EcoRI-digested human and
mouse DNA. Probes derived from the 5' and 3' ends of the
human β-actin 3' UTR hybridized strongly to all of the
human and mouse β-actin genomic sequences (data not
shown). However, a 118-base-pair HaeIII DNA fragment
derived from the middle of the β-actin 3' UTR hybridized to
all 20 human EcoRI β-actin gene sequences, but not to any
derived from the mouse DNA (Fig. 2, lanes R and S). We
designated this human β-actin-specific probe as HβAS.

The results of this experiment not only provided probes
that would allow us to address the chromosome distribution
of the β-actin-related sequences, but also provided an
important insight into the evolution of these sequences. Since
HβAS hybridizes to all the human β-actin sequences but not
FIG. 2. Segregation of human β-actin gene sequences between
human-mouse somatic cell hybrids. DNA was isolated from somatic
cell hybrids (lanes A through Q), and the parental mouse (lane R)
and human (lane S) cell lines. After digestion with EcoRI, the DNA
was electrophoresed on a 0.8% agarose gel, blotted to nitrocellulose,
and hybridized with the radiolabeled human β-actin-specific
DNA probe, HβAS. This probe recognizes 20 different-sized DNA
fragments, and their numerical designation is shown on the right.
The autoradiogram is the result of a 3-week exposure to Kodak
XAR-5 film at -80°C, except for lane S, which was exposed for only
4 days.

those of the rodent, it is evident that all 20 genomic human
β-actin sequences are more closely related to the functional
human gene than they are to the mouse genes. This, in turn,
strongly suggests that these copies of human β-actin
sequences have been generated since the divergence of
humans and rodents.

Over 30 different human-mouse somatic cell hybrid cell
lines, each carrying a different subset of human
chromosomes, were then assayed for the presence of the human
β-actin-related sequences. DNA prepared from these cell
hybrids plus the parental human and mouse cells was
digested with EcoRI, size fractionated, and hybridized with
the HβAS DNA fragment. Hybridization to a panel of
human-mouse cell hybrid DNAs revealed complex banding
patterns which differ between the different cell hybrids (Fig.
2, lanes A to Q). Pairwise comparisons of all these β-actin
bands demonstrated that very few of them show any possible
linkage (data not shown).

Each of the hybridizing human DNA fragments was
examined for its cosegregation in cell hybrids with a specific
human chromosome. We were able to determine for six of
the human β-actin EcoRI fragments the percent discordancy
for each chromosome among the various human
chromosomes in the hybrid cell panels, and from these data six
definitive chromosome assignments can be made (Table 2).
The data confirm the assignment of band 5 (the functional
gene, ACTB) to chromosome 7 and localize it to the
7pter→q22 region. The band 17 β-actin sequence also
cosegregates with human chromosome 7. However, the
band 17 sequence was not closely linked to ACTB, since its
locus can be assigned to the 7q22→qter region.
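
The percent discordancy used for these assignments can be illustrated with a minimal sketch. The function name and the five-hybrid panel below are hypothetical and are not taken from Table 2; the sketch only assumes that a hybrid is discordant when the band is present and the chromosome is absent, or the reverse, and that a band is assigned to a chromosome showing 0% discordancy.

```python
def percent_discordancy(band_present, chromosome_present):
    """Percent of hybrids in which band and chromosome scores disagree."""
    if len(band_present) != len(chromosome_present):
        raise ValueError("both score lists must cover the same hybrids")
    if not band_present:
        raise ValueError("the hybrid panel is empty")
    # A hybrid is discordant when exactly one of the two is present.
    discordant = sum(b != c for b, c in zip(band_present, chromosome_present))
    return 100.0 * discordant / len(band_present)


# Hypothetical scores for five hybrids (True = present); not data from the paper.
band = [True, True, False, True, False]
chromosomes = {
    "7": [True, True, False, True, False],  # agrees with the band in every hybrid
    "5": [True, False, False, True, True],  # disagrees in two of five hybrids
}

scores = {name: percent_discordancy(band, present)
          for name, present in chromosomes.items()}
assigned = [name for name, score in scores.items() if score == 0.0]
print(scores)    # {'7': 0.0, '5': 40.0}
print(assigned)  # ['7']
```

With real panels of 30 or more hybrids, a single chromosome with 0% discordancy and clearly nonzero values for all others supports an assignment.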

At least one β-actin sequence, band 1, cosegregates with
the X chromosome. Using a panel of X-autosomal
translocations segregating in cell hybrids, we further delimited the
location of band 1 to the Xq13→q22 region (data not shown).
Bands 7 and 13 represent sequences that both segregate only
with chromosome 5. Band 8 cosegregated only with
chromosome 18.

Thus we have assigned 6 of the 20 human β-actin EcoRI
fragments to specific chromosomes and, in several cases, to
specific subchromosomal regions. We have not yet been able
to assign the remaining β-actin sequences (bands) to specific
chromosomes because of the complexities of their patterns
and the possibility that some of the bands might represent
comigrating fragments. However, there is little
cosegregation of any of the β-actin gene sequences, and only one (band
17) cosegregates with the functional gene (band 5). This
demonstrates that the majority of these β-actin-related
sequences are not closely linked but, rather, are dispersed
throughout the human genome. Since a number of these cell
lines contain breaks in some of the human chromosomes,
lack of cosegregation does not preclude location on the same
chromosome, albeit at widely separated loci such as found
with band 17 and the functional gene. Thus a set of 20 DNA
sequences, apparently derived from the human β-actin gene
some time after the divergence of mice and humans, is
dispersed over the chromosomal landscape.

Sequence and organization of the human β-actin gene. To
analyze the β-actin gene and examine its relationship to
other actin genes, we have determined its nucleotide
sequence. Figure 3 includes the sequencing strategy we used.
Comparison of the genomic sequence with the cDNA
sequence previously reported (42) enabled us to construct the
structural organization of the human β-actin gene. The
similarity of intron lengths between human and rat β-actin
genes (37) has facilitated the mapping of these regions. The
intron lengths for human, rat, and chicken β-actin genes are
respectively 832, 927, and 903 base pairs (IVS I); 134, 87, and
320 base pairs (IVS II); 441, 464, and 524 base pairs (IVS III);
95, 88, and 306 base pairs (IVS IV); and 112, 124, and 355
base pairs (IVS V) (22, 37).



TABLE 2. Chromosome assignment of β-actin sequences(a)
a Chromosome assignment panels consisted of at least 30 cell hybrids for each β-actin sequence. The panels did not always contain the same cell hybrids. For
each band, the percent discordancy is listed for each chromosome. If a chromosome is present and a band is not (or the reverse), then the percent discordancy is
indicated for the total number of hybrids in that panel. If a band and a chromosome cosegregate together in cell hybrids, then there is no discordancy ("0"
discordancy), demonstrating that the β-actin sequence is encoded on the specific chromosome.

b Human chromosomes were identified by both karyotyping and previously mapped enzyme markers.

c With well-characterized translocation chromosomes, certain β-actin sequences segregated with specific chromosomal regions. Chromosome nomenclature
followed that of the Paris Conference (34a).
FIG. 3. Strategy employed for the determination of the nucleotide sequence of the human β-actin gene in pMlpl-2. A restriction map of the
6.5-kb insert in pMipi-2 has been published by Leavitt et al. (26). The clone has about 2 kb of DNA 5' of the mRNA cap site and about 1.5
kb of 3' UTR and πAN7 sequences downstream of the translation termination codon. Protein-coding regions are depicted by solid boxes,
UTRs are depicted by hatched boxes, and noncoding regions are depicted by empty boxes. The map shown only indicates the sites used for
sequencing and does not represent a complete restriction map of every enzyme indicated. The map positions of IVS I and IVS III subclones
from which intron probes are derived are indicated. These three IVS subclones are named pHβA-IVS I (Sph-Dde), pHβA-IVS I (BglI), and
pHβA-IVS III, respectively.
The DNA sequence of the 2,826 base pairs of the human
β-actin gene plus 240 base pairs of its 5' flanking region is
presented in Fig. 4. This transcription unit consists of 84
base pairs of 5' UTR, 1,128 base pairs of protein-coding
sequences interrupted by five introns (1,614 base pairs total),
and about 591 base pairs of 3' untranslated region. We have
not included a complete DNA sequence of the region
downstream of the termination codon since this gene was isolated
as a πAN7 recombinant. The sequences in this region have
been reported previously (42) and are inferred from the 3'
UTR of the β-actin cDNA clone.
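
The lengths quoted in this paragraph can be cross-checked against the intron lengths and codon count given elsewhere in the text. The short sketch below is only a bookkeeping aid; the variable names are illustrative, and the reading that the 2,826-base-pair figure covers the 5' UTR, coding sequence, and introns but not the 3' UTR is an inference from the arithmetic rather than a statement in the text.

```python
# Human intron lengths (base pairs) as listed in the text for IVS I through V.
INTRON_LENGTHS_BP = {
    "IVS I": 832,
    "IVS II": 134,
    "IVS III": 441,
    "IVS IV": 95,
    "IVS V": 112,
}
FIVE_PRIME_UTR_BP = 84   # 5' UTR length reported in the text
CODING_BP = 1128         # protein-coding sequence reported in the text
CODONS = 376             # codon count reported in the text

intron_total = sum(INTRON_LENGTHS_BP.values())
sequenced_gene = FIVE_PRIME_UTR_BP + CODING_BP + intron_total

print(intron_total)            # 1614, the stated intron total
print(sequenced_gene)          # 2826, the stated length of the sequenced gene
print(CODONS * 3 == CODING_BP) # True: 376 codons account for 1,128 base pairs
```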

The functional β-actin gene is single copy in the human
genome. When a DNA fragment derived from the 3' UTR of
β-actin was used as a hybridization probe in genomic
Southern blot experiments, we detected about 20 human genomic
DNA fragments (41) (Fig. 2). Several of these β-actin
sequences are known to be processed pseudogenes, as has
been found by molecular cloning and DNA sequence
analysis (33, 34; Engel, Ph.D. thesis). Accordingly, we used DNA
segments derived from IVS I and IVS III of the functional
gene as hybridization probes to determine how many of the
20 β-actin coding sequences also contained intron
sequences.

The probes (Fig. 3) were hybridized to genomic DNA
cleaved with a variety of restriction endonucleases. The
results with the IVS III probe are presented in Fig. 5. The
number of hybridizing DNA fragments in each digest is
consistent with the presence of but a single β-actin gene in
the genome. PstI and SacI each generate two hybridizing
fragments, as expected since they each cut once in the intron
probe used in the experiment. The IVS I hybridization
probe, derived from an SphI-DdeI fragment near the 5' end
of IVS I (Fig. 3), also hybridizes to single genomic DNA
fragments (data not shown).
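
The reasoning that an enzyme cutting once inside the probe region should give two hybridizing fragments at a single-copy locus can be sketched as follows. The probe sequence is an invented toy string, not the IVS III sequence, and the function names are hypothetical; the recognition sites are the standard ones for PstI (CTGCAG), SacI (GAGCTC), and EcoRI (GAATTC). The sketch assumes a single genomic locus and that every piece of the probe region remains long enough to be detected.

```python
# Standard recognition sequences for three of the enzymes used in Fig. 5.
RECOGNITION_SITES = {
    "PstI": "CTGCAG",
    "SacI": "GAGCTC",
    "EcoRI": "GAATTC",
}


def sites_in_probe(probe_sequence, site):
    """Count occurrences of a recognition site within the probe region."""
    return probe_sequence.upper().count(site)


def expected_hybridizing_fragments(probe_sequence, site):
    """Expected bands for a single-copy locus: one more than the cuts in the probe."""
    return sites_in_probe(probe_sequence, site) + 1


# Invented toy probe with one PstI site, one SacI site, and no EcoRI site.
toy_probe = "ATGCCTGCAGTTAACCGAGCTCGGTA"

expected = {enzyme: expected_hybridizing_fragments(toy_probe, site)
            for enzyme, site in RECOGNITION_SITES.items()}
print(expected)  # {'PstI': 2, 'SacI': 2, 'EcoRI': 1}
```

More bands than predicted in this way would point to additional genomic copies that carry the intron sequence.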

The size of the EcoRI genomic fragment that hybridizes in
Fig. 5 is about 14 kb. Our previous results, obtained with a
hybridization probe containing 5' region sequences (26),
revealed two EcoRI hybridizing fragments of 14 and 6.4 kb.
The 6.4-kb EcoRI fragment (band 11, Fig. 2) has been cloned
and is known to be nonfunctional, based on preliminary
DNA sequence analysis (Engel, Ph.D. thesis). This earlier
result had left open the possibility that the 6.4-kb band might
be an intron-bearing pseudogene. The results presented here
exclude that possibility and demonstrate that the
β-actin-coding sequence on the 6.4-kb EcoRI fragment lacks introns.

In sum, the results from genomic hybridization
experiments with two different hybridization probes derived from
two different regions of the β-actin gene demonstrate that
there is a single chromosomal locus for the human β-actin
gene and that the β-actin-related sequences on the 19 other
EcoRI fragments are all recently generated, intronless, and
dispersed on many chromosomes. From these results and
those obtained from cloned copies of some of these EcoRI
fragments, we conclude that these other EcoRI fragments
are probably processed pseudogenes.

Mapping the human β-actin gene transcription unit. We
localized the mRNA cap (or initiation) site of the β-actin
gene first by S1 nuclease mapping and then by DNA
sequence comparisons. The DNA fragment we used for
hybridization to human cellular RNA is an XhoI-BstNI
fragment, 127 bases long, extending from position -51 to
base 76 in Fig. 4. The result of this experiment (Fig. 6)
locates the region of the mRNA 5' termini to within four
nucleotides. Since S1-resistant DNA fragments migrate
lower by a base on thin sequencing gels than do DNA
fragments cleaved during chemical sequencing (12), the
major S1-resistant DNA fragment terminates with the base A
at position 1. The other protected fragments could represent
inexact digestion products or protection by minor transcripts
initiating at other positions.
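
The probe coordinates can be checked with a small sketch. It assumes that the numbering in Fig. 4 has no position 0 (the cap site is position 1 and the base before it is -1), which is what makes a fragment running from -51 to 76 exactly 127 bases long. It also assumes that the labeled end of the probe lies at position 76, so that a transcript starting at position 1 would protect a 76-base fragment. The function name is hypothetical.

```python
def span_length(start, end):
    """Bases from start to end inclusive on a numbering scale that skips 0."""
    if start == 0 or end == 0:
        raise ValueError("this numbering scheme has no position 0")
    if start > end:
        raise ValueError("start must not exceed end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the span crosses the -1/+1 boundary, so remove the absent 0
    return length


probe_length = span_length(-51, 76)      # full XhoI-BstNI probe
protected_length = span_length(1, 76)    # fragment protected by mRNA starting at +1
flanking_length = span_length(-51, -1)   # probe bases upstream of the cap site

print(probe_length, protected_length, flanking_length)  # 127 76 51
```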

Two consensus sequences of
RNA polymerase II promoters, TATA and CAAT, are found
fCCAGC ACCC CAAGGCGGCC AACGCCAAAA CTCTCCCTCC TCCTCTTCCT CAATNCTCGC TCTCGCTCTT TTTTTTTTTC GCAAAAGGAG GGGAGAGGGG GTAAAAAAAT GCTGCACTGT
G6C GAAGCC GGTGAGTGAG CG6CGCGGGG CCAATCGCGT GCGCCGTTCC GAAAGTTGCC TTTTATGGCT CGAGCGGCCG CGGCGGCGCC CTATAAAACC CAGCGGCGCG ACGCGCCACC 
^ SACCCCaTCC G ^CGCGAO CAC^CCT CGCf TTTfifT «CGC CCG TCC A CAC CCGCCGCCAG GTAAGCCCG GCCAGCCGAC CGGGGCATGC GGCCGCGGCC 
cnCGCCCG TGCAGAGCCG CCGTCTGGGC CGCAGCGGGG GGCGCATGGG GGGGGAACCG GACCGCCGTG GGGGGCGCGG GAGAAGCCCC TGGGCCTCCG GAGATGGGGG ACACCCCACG 
CAGTTCGGA GGCGCGAGGC CGCGCTCGGG AGGCGCGCTC CGGGGGTGCC GCTCTCGGGG CGGGGGCAAC CGGCGGGGTC TTTGTCTGAG CCGGGCTCTT GCCAATGGGG ATCGCAGGGT 
5G6 CGCGGCG TAGCCCCCGC CAGGCCCGGT GGGGGCTGGG GCGCCATGC6 CGTGCGC6CT GGTCCTTTGG GCGCTAACTG CGTGCGCGCQ GGGAATTGGC GCTAATTGCG «CT«WDQ 
GGGACTCAAG GCGCTAATTG CGGCTGCGTT CTGGGGCCCG GGGTGCCGCG GCCQGGGCQG GGGCGAAGGC G6GCTCGGTC GGAAGGGGTG GGGTCGCCGC GGCTCCCGGG C G A 

CTTCCTGCCC GAGCCGCQGG CCGCCCGAGG GTGTGGCCGC TGCGTGCGCG CGCGCGACCC GGCGCTGTTT GAAQCGGGCG GAGGCGGGGC TGGCGCCCGG TTGGGAGGGG GTTGGGGCCT 
SGC TTCCTGC CGCGCGCCGC GGGGACGCCT CCGACjAGTg TTTGCCTTTT ATGGTAATAA CGCGCCGGCC CGGCTTCCTT TATCCCCAAT CGTGCGCGCG CCGGCGCCCC CTA G 
AAGG ACTCGG CGCGCCGGAA GTGGCCAGGG CGGGGGCGAC TTCGGCTCAC AGCGCGCCCG GCTATTCTCG CAG CTCACC ATG GAT JJT GAT ATC GCC GCG CTC GTC GAC 
at positions -29 and -89, respectively. Further upstream of
the CAAT sequence there are long stretches of both
polypyrimidines and polypurines. Such sequences may
contain S1 nuclease-sensitive sites as found in similar regions in
other genes (8).

The region surrounding the cap site is conserved in
evolution (Fig. 7; see below), and our assignment of the 5'
terminus is consistent with that inferred from the alignment
of human and rat sequences. This assignment is further
strengthened by DNA sequence comparisons with the 5'
untranslated regions of two human β-actin pseudogenes (34).
These intronless, reverse transcript-type pseudogenes are
flanked by terminal repeats. The junction of these terminal
repeats and the 5' untranslated regions is within 1 base pair
of the S1 nuclease-protected terminus (data not shown).
Therefore it appears that these processed pseudogenes are
full length.

We conclude that the human β-actin mRNA has an
84-base 5' UTR. Our longest β-actin cDNA clone contains
41 base pairs of 5' UTR sequences (42); these are
identical to those of the corresponding 5' UTR of the β-actin
gene from nucleotides 44 to 78 and 911 to 916 in Fig. 4. Thus
the location of IVS I can be unambiguously assigned to
lie between bases 78 and 79 of the untranslated 5' leader
sequence. This IVS I is the longest intron, 832 base pairs in
length.
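
The coordinates in this paragraph are mutually consistent, as the following sketch shows. It assumes that IVS I occupies positions 79 through 910 of Fig. 4, which is inferred from the statements that the intron lies between bases 78 and 79 of the leader and that the cDNA match resumes at nucleotide 911. The helper name is hypothetical.

```python
def inclusive_length(first, last):
    """Number of positions from first to last, counting both ends."""
    return last - first + 1


exon1_utr = inclusive_length(1, 78)      # 5' UTR bases upstream of IVS I
ivs1 = inclusive_length(79, 910)         # inferred extent of IVS I
exon2_utr = inclusive_length(911, 916)   # 5' UTR bases downstream of IVS I
cdna_utr = inclusive_length(44, 78) + inclusive_length(911, 916)

print(exon1_utr + exon2_utr)  # 84, the 5' UTR length stated in the text
print(ivs1)                   # 832, the IVS I length listed for the human gene
print(cdna_utr)               # 41, the 5' UTR bases matched by the cDNA segments
```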

IVS II, III, IV, and V are located in the protein-coding
region of the gene. The second intron is 134 base pairs in
length and occurs between codons 41 and 42. The third
intron is 441 base pairs in length and occurs between codons
121 and 122. The fourth intron is 95 base pairs in length and
interrupts codon 267. The fifth intron is 112 base pairs in
length and occurs between codons 327 and 328. These intron
positions are identical with those of rat (37) and chicken (22)
β-actin genes. In addition, the relative lengths of the various
introns appear to be conserved between humans and rats.
IVS I and IVS III, for example, are the longest in all three
species. Thus the structural organization of the β-actin gene
is well preserved throughout vertebrate evolution.

The human β-actin gene thus has six exons. The second
exon contains 6 base pairs of 5' UTR and the first 41 codons.
The other exons contain 80, 146, 61, and 48 codons,
respectively. All 376 codons are identical to those from our cDNA
clone (42). Furthermore, this coding sequence is identical to
those from another human β-actin cDNA clone derived from
FIG. 5. Detection of the functional β-actin gene in the human
genome. (A) Human DNA (5 μg) was digested with EcoRI, size
fractionated on a 0.7% agarose gel, and blot transferred to a
nitrocellulose filter. The filter was hybridized to the β-actin 3' UTR
probe (pHFβA-3'UT) and washed as described in Materials and
Methods. The final wash was at 65°C in 0.5× SSC (1× SSC is 0.15
M NaCl plus 0.015 M sodium citrate)-0.1% sodium dodecyl sulfate.
The 14-kb genomic EcoRI fragment containing the functional β-actin
gene is indicated by an arrow. (B) Human DNA (5 μg) was digested
with the following: 1, EcoRI; 2, BglII; 3, HindIII; 4, PstI; 5, SacI; 6,
XbaI; or 7, BamHI. The DNA was then size fractionated on a 0.7%
agarose gel and blot transferred to a nitrocellulose filter. The filter
was hybridized to the IVS III subclone probe (MspI-Sau3AI; Fig. 3)
and washed as described in Materials and Methods. The final wash
was at 65°C in 0.5× SSC-0.1% sodium dodecyl sulfate. The IVS III
probe has a higher G+C content (53%) than the 3' UTR probe (42%)
and as such would detect more divergent genomic sequences than
the latter probe under identical hybridization conditions. Size
markers (in kilobases) are indicated to the right of the autoradiogram.




FIG. 6. S1 mapping of the human β-actin mRNA 5' termini. The
XhoI-BstNI* fragment, spanning the mRNA cap site (see the text),
was used for S1 analysis and chemical sequencing. Lanes 1 through
3 show sequencing ladders resulting from base-specific
modification reactions (1, A+G cleavage fragments; 2, C+T cleavage
fragments; 3, C cleavage fragments). The nucleotide sequence of
this region, based on chemical sequencing with this and other DNA
fragments (Fig. 4), is presented to the left of the autoradiogram.
Lanes 4 through 6 show S1-resistant fragments derived from the
XhoI-BstNI* probe after hybridization with (lanes 4 and 5) or
without (lane 6) human total cellular RNA prepared as described
previously (41).

mRNA isolated from a different tissue (19) and provides 
further evidence that this cloned gene is expressed in most, 
if not all, nonmuscle cells. 

The amino acid sequences of β-actins from humans, rats, 
and chickens are identical (22, 37, 42). The codons are also 
highly conserved. The overall similarity of human and rat 
DNA sequences in the coding exons is 92%, and the overall 
similarity of human and chicken DNA sequences in the 
coding exons is 88%. 

There are 22 to 24 amino acid differences between mammalian 
β-actin and various muscle actins (52). We found that 
the coding DNA sequences of the human β-actin gene were 
86% similar to those of human α-skeletal actin (18) and 78% 
similar to those of human α-cardiac actin (17). In addition, the 
similarity with the first seven exons of the aortic smooth 
muscle actin sequence (51) is 80%. Since the human β-actin 
sequences are more closely related to the rat and chicken 
β-actin sequences than they are to any of the human muscle 
actin sequences, we conclude that the actin gene duplication 
that engendered the muscle actin and β-actin genes 
apparently occurred before the divergence of mammals and 
birds. 

Conservation of noncoding sequences: 5' flanking region. 
-132 

CTCTCGCTCT TTTTTTTTTT CGCAAAAGGA GGGGAGAGGG GGTAAAAAAA Human 

f * **+ + * • * 

TTTTTTTTTT TTTTTTTTTT TGCAAAAG AAGGG GGTAAAAAAA Rat 

103 

-84 

TGCTGCACTG TC-GGCGAAG CCGGTGAGTG AGCGGCGCGG GGCCAATC-G Human 
* * *** • * 

TGCTGCACTG TCGGGCGAGG CCGGTGAGTG AGCGAGCCGG AGCCAATCAG Rat 

153 

-34 

CGTGCGCCGT TCCGAAAGTT GCCTTTTATG GCTCGAGCGG CCGCGGCGGC Human 

CGCCCGCCGT TCCGAAA-TT GCCTTTTATG GCTCGAGTGG CCGCTGTGGC Rat 

202 



GCCCTATAAA ACCCAGCGGC GCGACGCCGC ACCACCGCCG AGACCGCGTC Human 
0 - * ****** 

GTCCTATAAA ACCCGGCGGC GCAACGC-GC GCCACTGTCG AGTCCGCGTC Rat 

251 

65 

CGCCCCGCGA GCACAGAGCC TCGCCTTTGC CGATCCGCCG CCC— GTCCA Human 
** * •* * *•«• * * • * ** 

OACCCGCGA GTAC — AACC TCC TTGC AGCTCCTCCG TCCCGGTCCA Rat 

295 
FIG. 7. Comparison of the 5' flanking region and the 5' UTR of 
the human β-actin gene with the rat β-actin gene. The sequences 
have been aligned for maximal homology. Mismatches and gaps are 
indicated by asterisks. The human sequence is numbered as in Fig. 
4. The rat sequence is numbered as in Nudel et al. (37). The rat 5' 
untranslated region (80 base pairs total) is between bases 235 and 
1241, interrupted by the 927-base-pair 5' IVS (bases 309 to 1235). 

Kost et al. (22) reported the presence of DNA sequence 
similarity in the 5' flanking region of rat and chicken β-actin 
genes. We examined the extent of sequence similarity between 
the β-actin genes of these two species and the human 
sequences. The similarity between human and rat sequences 
is extremely high (Fig. 7); there is also significant homology 
between human and chicken sequences (data not shown). 
We calculated the degree of sequence similarity of this 
region of the β-actin genes by applying the method of Miyata 
et al. (32), which yields a similarity value K_N(1) that is 0 for 
identical sequences and approaches a terminal value of 0.75 
for random sequences. [K_N(1) is the K_N value calculated 
based on method one; i.e., when a gap is found in any one of 
the aligned sequences, the corresponding site is excluded 
from the calculation (32).] In this region, from the mRNA 
cap site to about 1?0 base pairs upstream, the K_N(1) value is 
0.098 for the human-rat comparison and 0.302 for the 
human-chicken comparison. The human-rat K_N(1) value is among 
the lowest yet found in comparisons of any 5' flanking 
regions in mammals. This extraordinary conservation of 
DNA sequences in this 5' region may be a reflection of 
essential functions associated with the constitutively and 
ubiquitously active β-actin gene promoter. 
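
As an illustration only, the gap-excluding calculation described above can be sketched in Python. The sketch assumes that the value is the plain fraction of mismatched sites among aligned positions at which neither sequence has a gap, an assumption that is consistent with the stated range of 0 (identical) to 0.75 (random); the published method of Miyata et al. (32) may differ in detail. The function name `k_n1` and the toy sequences are hypothetical and are not taken from the figures.

```python
def k_n1(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Fraction of mismatched sites among ungapped aligned positions.

    Assumption: this uncorrected mismatch fraction stands in for the
    similarity value used in the text; it is not the authors' own code.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # method one: skip any site with a gap in either sequence
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return mismatches / compared


if __name__ == "__main__":
    # Toy alignment: 9 columns, 1 gapped column, 1 mismatch in 8 compared sites.
    print(k_n1("ACGT-ACGT", "ACGTTACCT"))
```

Under this reading, two unrelated sequences with equal base frequencies would agree at about one site in four by chance, which is why the value levels off near 0.75.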

Conservation of noncoding sequences: 5' and 3' UTRs. We 
have reported the evolutionary conservation of the 3' UTR 
of the human β-actin cDNA (42). Since the 5' UTR of our 
β-actin cDNA clone was not full length, we had not been 
able to examine the sequence conservation of that region. 
We can now extend the sequence comparison between the 
human and rat 5' UTR sequences (Fig. 7). The K_N(1) value 
for this region, 0.179, is very low. Therefore this 5' UTR is 
nearly as well conserved as the 3' UTR of human and rat 
β-actin genes [K_N(1) value of 0.135 (42)] and the 5' UTR of 
human and rat skeletal actin genes [K_N(1) value of 0.127 
(18)]. 

Conservation of noncoding sequences: introns. Our comparisons 
of β-actin gene sequences have shown that they are 
exceptions to the general observation that UTR sequences 
are not well conserved. This finding, as well as the conservation 
of both the β-actin gene intron locations and their 
relative lengths, raised the possibility that the sequences of 
the introns might also be conserved. 

We aligned the nucleotide sequences of the human and rat 
IVS segments by using the Wilbur and Lipman algorithm 
(56). There are strong sequence similarities in IVS I and IVS 
III (Fig. 8, 9, and 10). The IVS I alignments have a K_N(1) 
value of 0.258 with a total of 392 base pairs matching and a 
minimal number of gaps inserted (Fig. 8). Although this 
K_N(1) value is not as low as that for the 5' flanking region or 
the 3' UTR, it is highly significant when compared to values 
derived for other introns whose DNA sequences are known. 
For example, the large introns of the human and mouse 
β-globin genes, which are 850 and 653 base pairs long, 
respectively, have a K_N(1) value of 0.459 with 315 base pairs 
matching. Moreover, comparison of intron sequences of 
human insulin and metallothionein-IA genes with their mouse 
homologs also did not reveal significant similarities [K_N(1) 
values of at least 0.4]. In contrast, the introns of human and 
mouse protooncogene c-fos are more conserved than the 5' 
flanking regions (53). However, since the conserved introns 
of β-actin and c-fos genes are dissimilar, it appears unlikely 
that they are directly involved in the regulation of β-actin 
and c-fos gene transcription by growth factors (5, 13). 
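
The intron comparisons above quote both a similarity value and a count of matching base pairs. As a rough aid to reading those figures, the short Python sketch below estimates how many ungapped sites each comparison covered. It rests on the assumption that the similarity value is the uncorrected fraction of mismatched sites, so that matches = sites x (1 - value); if the published method applies any correction, these estimates are only approximate. The function name `compared_sites` is hypothetical.

```python
def compared_sites(matches: int, k_value: float) -> float:
    """Estimate the number of ungapped aligned sites.

    Assumes k_value = mismatches / sites, hence sites = matches / (1 - k_value).
    """
    if not 0.0 <= k_value < 1.0:
        raise ValueError("k_value must lie in [0, 1)")
    return matches / (1.0 - k_value)


if __name__ == "__main__":
    # Values quoted in the text for the two human-rodent intron comparisons.
    print(round(compared_sites(392, 0.258)))  # actin IVS I, human versus rat
    print(round(compared_sites(315, 0.459)))  # large globin intron, human versus mouse
```

On this assumption, the two comparisons span a similar number of sites (roughly 530 and 580), so the difference between 0.258 and 0.459 is not an artifact of comparing segments of very different length.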

Conservation of noncoding sequences: enhancer-like and 
potential Z-DNA sequences in IVS I. Within the conserved 
segment of IVS I is a 30-base-pair subsequence that is highly 
conserved from humans to chickens. This sequence, located 
673 base pairs from the 5' end and 129 base pairs from the 3' 
end of the intron (underlined in Fig. 4), is shown in Fig. 9. 
There are only two mismatches, both of them pyrimidine 
changes, between the human and rat sequences. There are 
five additional mismatches between the human and chicken 
sequences, but the K_N(1) value of this segment is still low 
(0.233). Moreover, in all three species the location of this 
sequence is conserved relative to the ends of the intron 
(legend to Fig. 9). Of interest is that this 30-base-pair 
segment contains sequences similar to those associated with 
viral enhancer elements. The simian virus 40 72-base repeat 
contains the sequence TGTGGAAA (24), and the more 
active murine sarcoma virus promoter-distal 73-base-pair 
repeat contains the sequence TGTGGTAA (23). The human, 
rat, and chicken β-actin introns all contain the sequence 
TATGGTAA within the 30-base-pair conserved region. 
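
The counts quoted above can be checked with a few lines of Python. This is a sketch, not part of the original analysis: the three 30-base sequences are transcribed here from the bracketed segment of Fig. 8 and from Fig. 9 and should be treated as an assumption of the sketch, and the helper name `mismatch_positions` is hypothetical.

```python
# 30-base conserved IVS I segment, as read from Fig. 8 and Fig. 9 (assumed transcription).
HUMAN = "CCAGTGTTTGCCTTTTATGGTAATAACGCG"
RAT = "CCAGCGTTTGCCTTTTATGGTAATAATGCG"
CHICKEN = "GCAGCCATTGCCTTTTATGGTAATCGTGCG"

ACTIN_MOTIF = "TATGGTAA"  # shared by all three actin introns
SV40_MOTIF = "TGTGGAAA"   # simian virus 40 72-base repeat
MSV_MOTIF = "TGTGGTAA"    # murine sarcoma virus 73-base-pair repeat


def mismatch_positions(a: str, b: str) -> list:
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    print("human vs rat:", mismatch_positions(HUMAN, RAT))
    human_chicken = mismatch_positions(HUMAN, CHICKEN)
    print("human vs chicken:", human_chicken)
    print("mismatch fraction:", round(len(human_chicken) / len(HUMAN), 3))
    print("vs SV40 motif:", mismatch_positions(ACTIN_MOTIF, SV40_MOTIF))
    print("vs MSV motif:", mismatch_positions(ACTIN_MOTIF, MSV_MOTIF))
```

With these sequences, the human and rat segments differ at two sites (a T/C and a C/T change), the human and chicken segments differ at seven of 30 sites (7/30 = 0.233), and the actin motif differs from the simian virus 40 and murine sarcoma virus motifs at two sites and one site, respectively.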

The IVS I sequences of human, rat, and chicken β-actin 
also contain several potential Z-DNA sequences (36). There 
are four short elements in both the human and rat genes 
(underlined in Fig. 8). The locations of three of these four 
elements are conserved between humans and rats. Short 
potential Z-DNA elements appear to be part of the simian 
virus 40 enhancers and of retroviral long terminal repeats 
(36). The potential Z-DNA elements in these β-actin introns 
might equally well function in concert with the 30-base-pair 
conserved region to mediate enhanced transcription. 
Demonstration that this 30-base-pair conserved region or the 
potential Z-DNA sequences have enhancer-like activities 
must await direct experimentation. 

Neither this 30-base-pair sequence nor sequences related 
GGGTCTTTGTCTGAGCCGGGCTCTTGCCAATGGGGATCGCAGGGTGGGCGCGGCGTAG Human 
** • **** * * • * * ***** ** 

GGGTCTTTGTCCAAACCGGTT--TTGCCATTCGGCTTGGC GGGCGCGGCGGGG Rat 

721 

427 

CCCCCGCCAGGCCCGGTGGGGGCTGGGGCGCCA-I£££££T^£G£££TGGTCCTTTGG Human 
•** ** • • • * ••** •* 

CC GCTCGGCCGGGTGGGGGCTGGGATGCCATTGCGCGTGCGCGCTCTATCACTGG Rat 

776 

481 

GCGCTAAC TGCGTGCGCGCO GGGAATTGGCGCTAATTGCGGCTGCGGCCQGG GA Human 

mm ***** * ***** * ** * **** ***• ******* 

GCATTGGGGC CGTGCGCGC TGG- J GGAGGGAACTCTTCCTCTCCCCCTCTTCCGA Rat 

829 



539 

CTCAAGGCGCTAATTGCGGCTGCGTTCTGGGGCCCGGGGTGCCGCGGCCQGGGCQGGG Human 

* * ******* ** *****••••* ** *** 

GTTAAGAG T TGCGCGTGCGT ATTGAGACTAGGAGCGCGGCCGCCCCGGGTTGG Rat 

882 

594 

GCGAAGGCGGGC--TCGGTCGGAAGGGGTGGGGTCGCCGCGGCTCCCGGGCGCTTGC- Human 
* ** ••* • *• **** • * 

GCGAGGGCGGGCCGTCCACCGGAAGGGGCGGGGTCGTAGCGGCTA GGCGCCTGCT Rat 

937 

651 

-GCACTTCCTGCCCGAGCCGCQGGCCGCCCGAGGGTGTGGCC6CIGCGJ^£&gfi£££fi. Human 

• • *»*♦♦♦*»»****•****** ***** * 

CGCGCTTCCTGCT GGGT6TGGTCGCCTCCCGC0C6C6C A Rat 

976 

700 

££ A CCCGGCGC- - -TGTTTGAAQCGGGCGGAGGCGGGGCTGGCGCCCGGTTG Human 

* *•** * ***** ************* * •* • 

CTAGCCGCCCGTCGCCTCAGTGT AGGCGGGCCTGT-GCCCGTTTG Rat 

1020 

756 

GGAGGGGGTTGGGGCCTGGCTTCCTGCCGCGCGCCGCGGGGACGCCTCCGA-[CCAGT Human 
mm *• * ******** *• ** * 

GGGAGGGGGGGAGGCCTGGCTTCCTGCCGT GGGTCCGCCTCCGGG[CCAGC Rat 

1070 

811 

GTTTGCCTTTTATGGTAATAACGCG]CCGGCCCGGC--TTCCTTTATCCCCAATQfiI£. Human 
• ****** * ******* 

GTTTGCCTTTTATGGTAATAATGCG]GCTGTCCTGCGCTTCCTTTGTCCCCTGA Rat 

1123 

CGCGCGC CGGCGC-CCCCf AGCGGCCTAAGGACTCGGCGCGCCGGAAGTGGCCAGGGC Human 
*** *** * * *** • • • * * 

GCTTGGGCGCGCCCCTGGCGGCTCGAGGCCGCGGCTCGCCGGAAGTGGGCAGG-C Rat 

1177 

910 

GGGGGCGACTTCGGCTCACAGCGCGC — CCGGCTATTCTCGC AG Human 

** • * *** •* * * ****••**•*• ** * * ** ***** 

GGCAGCGGCTGCTCTTGGCGGCTCGCGGTGACCATAGCCCT CTTTTGTGCCTTGATAG Rat 

1235 

FIG. 8. Comparison of IVS I sequences of the human β-actin gene with the rat β-actin gene. The sequences have been aligned for maximal 
homology. Mismatches and gaps are indicated by asterisks. There are 235 base pairs (in humans) and 362 base pairs (in rats) of IVS I 
sequences that do not align well with each other between the 5' splice sites and the conserved regions shown in this figure. The human 
sequence is numbered as in Fig. 4. The rat sequence is numbered as in Nudel et al. (37). The potential Z-DNA-forming elements are 
underlined. The 30-base-pair highly conserved sequence (see Fig. 9) is enclosed in brackets. The K_N(1) value for the 5' subregion (293 base 
pairs; bases 313 to 605) is 0.284. The K_N(1) value for the 3' subregion (216 base pairs; bases 679 to 894) is 0.220. These two subregions are 
separated by a middle subregion with two long gaps of more than 10 nonmatching bases each. The latter region consists of 72 base pairs in 
humans (32 base pairs nonmatching with rats) and 49 base pairs in rats (9 base pairs nonmatching with humans). When comparing rat and 
chicken IVS I sequences (data not shown), we obtained K_N(1) values of 0.466 and 0.347 for the 5' and 3' subregions, respectively. 
CCAGTGTTTGCCTTTTATGGTAATAACGCG  Human    781
    *                     *
CCAGCGTTTGCCTTTTATGGTAATAATGCG  Rat      1095
*    **                 **
GCAGCCATTGCCTTTTATGGTAATCGTGCG  Chicken  -293


FIG. 9. A 30-base-pair conserved region within β-actin gene IVS 
I sequence. Mismatches are indicated by asterisks. The human 
sequence is numbered as in Fig. 4. There are 673 and 129 base pairs 
of intervening sequences 5' and 3' to this segment in the human 
intron. The rat sequence is numbered as in Nudel et al. (37). There 
are 757 and 140 base pairs of intervening sequences 5' and 3' to this 
segment in the rat intron. The chicken sequence is numbered as in 
Kost et al. (22). There are 587 and 286 base pairs of intervening 
sequences 5' and 3' to this segment in the chicken intron. The 
8-base sequences resembling viral enhancer elements are 
underlined. 



to it are found in the other four introns. In fact, the longest 
sequences common to the five introns are only 4 nucleotides 
long. This observation and the fact that the sequences in the 
three smallest β-actin gene introns (IVS II, IV, and V) are 
fully diverged from their rat and chicken counterparts suggest 
that they contain no selected sequences. 

Conservation of noncoding sequences: IVS III. IVS III, the 
second-longest intron of the β-actin gene, is, however, also 
highly conserved in evolution. About 73% of its base 
pairs match those of IVS III of the rat (Fig. 10). The K_N(1) 
value for this region, when compared with that for the rat, is 
0.223. There is a 68-base-pair subsegment that is conserved 
between humans, rats, and chickens (underlined in Fig. 10). 
QC ■CCTTTCTCACTGGTia^^ 
AC6CCCTTCTCTAATT6TCTTTC 

1595 

gtctcccttggaactttgcagtttct gctctttcccagatgaggtcttttt 

1641 

TTTGTCTC TCTGACTAGGTGTC-TAAG ACAGTGTTGTGGGTGTAGGT Human 

1 ********* ** ****** * 



TTTCTCT 



CGATCGCCTTTCTGACTAGGTGTTTTAAACCCCTACAGTGCTGTGGGTGTAGGT 



1913 
1701 



ACTAACACTGGCTCGTGTGA£A£rtaGC^ 

ACTAACAATGGCTCGTGTGACAAAGCTAATGAGGCTGGTGATAAATGGCCTTGGAGTGTGT 



1795 



* * ***** ** 



GCCGTGTTCTTTGCACTTTCTGCATGTCCCCCGT ----- 

# m * * * * * * ************** 

acttIgctgtgttctt-gccctctttgcatgtctcactcaaatctatccttacagtctcac 

1854 

ctggcctggctgtccccagtggcttccccagtgtgac-atggtgcat 

CtGCCCTGAGTGTTTCTTGTGGCTTTAGGAGCTTGACAATACTGTATTGCTTTCTCTACAG 



Human 
Rat 

Human 
Rat 



Rat 

Human 
Rat 



1761 

-CCCAGCACACTTA Human 



ATTAAGTAGGCGCACAGTAGGTCTGAACAGACTCCCCATCCCAAGAC- 
ATTCAGTAGATG-ACAGTAGGTCTAAATGGAGCCCCTGTCCTGATACTCCCAGCACACTTA Rat 



Human 
Rat 

Human 
Rat 



FIG. 10. Comparison of IVS III sequences of the human β-actin gene with the rat β-actin gene. The sequences have been aligned for 
maximal homology. Mismatches and gaps are indicated by asterisks. [...] nonmatching gaps (from human base number 1437 [...]). 
The conserved dyad symmetry element, CAAGGCC-N18-GGCCTTG, is also underlined. 
Overall, the evolutionary conservation of this region is even 
higher than that of IVS I. However, since this conserved 
intron is about 1.5 kb from the 5' promoter region(s), it is less 
likely to have a major role in the regulation of actin gene 
expression. Nevertheless, such strong sequence conservation 
implies selection for some unknown function. There are 
several unusual features in IVS III that might be clues to 
such functions. First, this region has a G+C content of only 
55% whereas the G+C content of IVS I is 76%. Second, IVS 
III is composed of 61% pyrimidines with four runs of 10 or 
more polypyrimidines. Third, there are several sequences in 
this intron with dyad symmetry. One of these, 
CAAGGCC-N18-GGCCTTG in humans (underlined in Fig. 10), has 12 
nucleotides out of 14 conserved between humans and rats. A 
similar sequence with dyad symmetry 
(GGGCCTG-N19-CAGGCTC) is found in the chicken β-actin IVS III. 
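
The features listed above (base composition and dyad symmetry) are simple to test computationally. The Python sketch below is offered only as an illustration, using the two dyad elements exactly as printed in the text; the helper names `reverse_complement`, `dyad_mismatches`, and `base_fraction` are hypothetical. A perfect dyad is taken here to mean that the right arm equals the reverse complement of the left arm.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, and T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def dyad_mismatches(left_arm: str, right_arm: str) -> int:
    """Count positions where right_arm departs from the reverse complement of left_arm."""
    if len(left_arm) != len(right_arm):
        raise ValueError("arms must have equal length")
    expected = reverse_complement(left_arm)
    return sum(1 for x, y in zip(expected, right_arm.upper()) if x != y)


def base_fraction(seq: str, bases: str) -> float:
    """Fraction of residues in seq that belong to the given set of bases."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(1 for b in seq if b in bases) / len(seq)


if __name__ == "__main__":
    print(dyad_mismatches("CAAGGCC", "GGCCTTG"))  # human element arms
    print(dyad_mismatches("GGGCCTG", "CAGGCTC"))  # chicken element arms
    # G+C content and pyrimidine content of one arm, for illustration only.
    print(base_fraction("CAAGGCC", "GC"), base_fraction("CAAGGCC", "CT"))
```

By this test the human element is a perfect dyad, whereas the chicken element as printed departs from perfect symmetry at one position. The same `base_fraction` helper, applied to a full intron sequence, would give the G+C and pyrimidine percentages of the kind quoted above.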

DISCUSSION 

β-Actin is the product of a single functional gene. Since 
there are 20 or more DNA segments in the human genome 
that contain β-actin sequences, it was important to establish 
that the β-actin gene described here is the major, if not sole, 
functioning human β-actin gene. Several lines of evidence 
strongly support this conclusion. First, the mRNA encoded 
by this gene is identical to that of cDNAs cloned from a 
human fibroblast library (42) and from a human epidermal 
cell library (19). Thus this gene encodes the major β-actin 
mRNA in two human cell types. 

Second, an allele of this gene can be expressed after 
transfection into mammalian cells. We have cloned and 
partially sequenced the two alleles of the β-actin gene, both 
of which are expressed, from the human HuT-14T cell line (26; C.-S. 
Lin, S.-Y. Ng, P. Gunning, L. Kedes, and J. Leavitt, Proc. 
Natl. Acad. Sci. USA, in press). One of the alleles 
contains a codon-altering point mutation at codon 244, and 
the other allele is identical to the β-actin gene reported here. 
These two genes were cloned by screening genomic libraries 
of HuT cells with a 5' IVS I probe from our human β-actin 
gene. Since the IVS I is single copy, we conclude that the 
wild-type β-actin gene and the mutant gene(s) must be allelic 
copies. The mutant β-actin allele has been transfected into 
both human and rat-2 cells [J. Leavitt, P. Gunning, L. 
Kedes, and R. Jariwalla, Nature (London), in press], where 
it robustly expresses mutant β-actin. In addition, in blotting 
experiments with the RNA from the transfected rat-2 cells, 
we detect only a single discrete 2.1-kb human β-actin 
transcript. We conclude that the mutant β-actin allele and thus 
the nonmutant β-actin gene described in this manuscript are 
functional in human cells and are expressed under the 
regulation of a strong promoter. 

The data presented in this paper strongly argue that, apart 
from the functional gene, all of the β-actin gene 
sequences detected by the human β-actin 3' UTR probe are 
pseudogenes. The demonstration that all of the human 
β-actin gene sequences are more closely related to the 
functional human gene than they are to any of the β-actin 
genes of the mouse indicates that these human sequences 
were generated recently in evolution, and certainly after the 
divergence of mice and humans. These other human β-actin 
DNA fragments fail to hybridize with probes containing 
either of two β-actin intervening sequences. This implies 
that these other genes are intronless and, accordingly, are 
likely to be processed pseudogenes of the reverse transcript 
type. Furthermore, the nonlinkage of these gene sequences 
indicates that they are dispersed randomly throughout the 
genome and that tandem gene duplication has had little to do 
with their generation. In addition, direct sequence analysis 
of several of these gene sequences has confirmed that they 
are indeed intronless pseudogenes of the reverse transcript 
type (33, 34). We conclude that the human β-actin 3' UTR 
probe, Hp AS, defines a family of related gene sequences of 
which one is a functional gene and the remainder are 
intronless pseudogenes. 

On the origin and dispersion of processed pseudogenes. We 
have shown by chromosome mapping that there is but a 
single copy of an expressed β-actin gene in the human 
genome and that the 19 (or more) β-actin-related sequences 
are not closely linked to this β-actin gene or to each other. 
From the data discussed earlier, it is highly likely that the 
human β-actin multigene family consists of a single 
expressed gene and at least 19 processed pseudogenes. The 
fact that the single functional β-actin gene, found on 
chromosome 7, is not closely linked with any of its related 
sequences is one of the first demonstrations in a multigene 
family of nonlinkage between an expressed gene and its 
related processed pseudogenes. This finding implies that the 
integration of processed pseudogenes in the human genome 
is probably random. Furthermore, the lack of detectable 
linkage of any of these β-actin sequences to each other 
indicates that tandem duplication has not been involved in 
the expansion of this family. 

Based on our previous observation that the cytoskeletal 
actin genes, but not the sarcomeric actin genes, are associated 
with pseudogene families, we proposed the hypothesis 
that linked the expression of a gene in germ line cells to the 
production of large processed pseudogene families (41). In 
keeping with this hypothesis, other multigene families that 
consist predominantly of dispersed processed pseudogenes 
also appear to be expressed in germ cells. Examples include 
the multigene families encoding argininosuccinate synthetase 
(9), dihydrofolate reductase (1), glyceraldehyde 3-phosphate 
dehydrogenase (40), metallothionein (21), α-tubulin (28), and 
β-tubulin (27). It will be interesting to determine whether the 
level of mRNAs of a particular gene in germ cells correlates 
with the abundance of processed pseudogenes in the 
genome. 

Use of multigene families for chromosome mapping. One 
unexpected outcome of this work was the relative ease and 
efficiency we observed in the mapping of a number of related 
DNA restriction fragments to specific chromosomes. We 
had initially expected that we would have to develop a 
gene-specific probe for each β-actin genomic DNA fragment 
(such as a flanking region probe) to distinguish restriction 
fragments in somatic cell hybrids. What we discovered was 
that selection of a species-specific probe allowed us to detect 
as many as 20 human-specific EcoRI fragments and to assign 
at least six of these fragments to specific chromosomes. For 
example, the mapping of ACTB to human chromosome 7 can 
be accomplished by the use of the Hp AS probe alone. Thus 
small multigene families appear to provide a general 
approach for simultaneously mapping a number of restriction 
fragments from the same set of somatic cell hybrids. 
Furthermore, this approach yields multiple sets of data 
from the same genomic Southern blots. For 
example, we should be able to map members of a second 
multigene family (such as γ-actin) with these same blots by 
using a species-specific 3' UTR probe. By comparing the two 
sets of data, we will also be able to determine whether 
various β- and γ-actin sequences cosegregate. 
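
The logic of assigning a restriction fragment to a chromosome from a panel of somatic cell hybrids can be sketched as a concordance test: a fragment is assigned to a chromosome only if, in every hybrid, the fragment is present exactly when that chromosome is present. The Python example below is a minimal sketch of that idea under stated assumptions. The hybrid names, their chromosome contents, and the fragment scores are invented for illustration and are not data from this study, and the function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical panel: human chromosomes retained by each hybrid cell line.
KARYOTYPES: Dict[str, Set[str]] = {
    "H1": {"1", "7"},
    "H2": {"1", "X"},
    "H3": {"7", "X"},
    "H4": {"2"},
}

# Hypothetical Southern blot scores: is the human fragment seen in each hybrid?
FRAGMENT: Dict[str, bool] = {"H1": True, "H2": False, "H3": True, "H4": False}


def discordant_hybrids(fragment: Dict[str, bool],
                       karyotypes: Dict[str, Set[str]],
                       chromosome: str) -> List[str]:
    """Hybrids in which fragment presence and chromosome presence disagree."""
    return [hybrid for hybrid, present in fragment.items()
            if present != (chromosome in karyotypes[hybrid])]


def candidate_chromosomes(fragment: Dict[str, bool],
                          karyotypes: Dict[str, Set[str]]) -> List[str]:
    """Chromosomes that are fully concordant with the fragment across the panel."""
    all_chromosomes = set().union(*karyotypes.values())
    return sorted(c for c in all_chromosomes
                  if not discordant_hybrids(fragment, karyotypes, c))


if __name__ == "__main__":
    print(candidate_chromosomes(FRAGMENT, KARYOTYPES))
    print(discordant_hybrids(FRAGMENT, KARYOTYPES, "1"))
```

Because every fragment detected by a single probe is scored on the same blots, the same panel can be reused for each fragment in turn, which is the economy described above. Real panels contain many more hybrids and must tolerate occasional scoring errors, which this sketch does not attempt.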

Knowledge of the chromosomal assignments of a set of 
dispersed restriction fragments, such as those generated 
from an analysis of the β- and γ-actin multigene families, will 
have an important application for human chromosome mapping, 
or mapping of chromosomes for any species, when 
applied to somatic cell hybrids. We expect that at least half 
of the human chromosomes will contain either a β- or γ-actin 
gene sequence that can be readily visualized on a genomic 
Southern blot (Fig. 2). These loci could therefore serve as 
markers for specific human chromosomes in human-rodent 
somatic cell hybrids. Thus probes from β- and γ-actin, or a 
probe from any dispersed, moderately small multigene family, 
may provide a more convenient means of evaluating, at 
least initially, newly generated cell lines than conventional 
karyotyping or analysis of biochemical markers. 

Evolution of intron segments. Eucaryotic actin gene 
isotypes share subsets of 15 known intron positions (46, 51), 
but no simple, parsimonious scheme can account for both 
the evolution of the various isotypes and the intron locations 
(16). Each gene has zero to seven introns. Therefore it has 
been proposed that different introns have been excised (or 
inserted) during evolution of various actin genes found in 
present-day eucaryotes (10, 51). 

Mammalian sarcomeric (α-skeletal and α-cardiac) actin 
genes have two intron positions not found in the cytoskeletal 
β-actin genes (IVS III and IVS IV), and the cytoskeletal 
β-actin gene has a single unique intron position (IVS III) not 
found in α-skeletal and α-cardiac actin genes. We have 
shown here that the β-actin IVS III intron segment is 
evolutionarily well conserved. Similarly, the IVS III of rat 
and chicken α-skeletal actin genes is also conserved [the 
K_N(1) value is 0.235 with 56 base pairs matching]. The 
preservation of these unique aspects of the sarcomeric 
versus the cytoskeletal actin gene organization throughout 
vertebrate evolution suggests that some intron segments 
have become functional domains of the respective 
transcription unit. Determination of the intron locations and structure 
of the other vertebrate cytoskeletal actin gene, the γ-actin gene, 
may help us understand the evolution of intervening 
sequences. 

Segments of genes can evolve at different rates. For 
example, we previously have reported evidence for separate 
units of selection within the 3' UTRs of mammalian 
α-skeletal and α-cardiac mRNAs (14). The 3' half of these 3' 
UTRs shows stronger evolutionary conservation than the 
respective 5' half. Similarly, IVS I, located in the 5' UTR, 
has a nonconserved block, 235 base pairs in humans and 362 
base pairs in rats, upstream of the conserved regions shown 
in Fig. 8. On the other hand, the entire 3' UTR of the β-actin 
gene is uniformly conserved and appears to evolve as a 
single unit (42). Likewise, IVS III, the most highly conserved 
intron, appears to have been selected as a single unit, 
since the nucleotide differences between it and its rat 
homolog are randomly distributed. 

Conserved intron segments. Our finding of highly conserved 
blocks of nucleotides in two of the five intervening 
sequences of β-actin genes raises the possibility that these 
segments have regulatory functions. Conserved internal 
regions have been reported previously, such as the internal 
transcriptional enhancer regions of immunoglobulin genes 
(2, 11, 39, 44). However, the locations of these enhancers 
were initially regarded as a peculiarity of the immunoglobulin 
gene loci. More recently, internal control regions have 
been detected (but not yet defined) for the adenovirus E1A 
gene (38), human globin genes (3), and chicken thymidine 
kinase gene (31). Any conclusion that the conserved β-actin 
intron sequences, especially those of IVS I, function as 
transcriptional enhancers must await direct experimentation. 
Nevertheless, the evolutionary conservation of the 
immunoglobulin enhancer segments (6, 20) indicates that 
other transcriptional enhancers or cis-acting regulatory signals 
would be under selective pressure. It is interesting to 
note in this regard that the IVS I of both α- and β-globin 
genes are the most conserved introns of these genes. The 
IVS I of the human and mouse β-globin genes, for example, 
has 81 base pairs matching to give a K_N(1) value of 0.302. 
Therefore these introns may well contain part of the proposed 
downstream regulatory elements (3). 